Abstract

Background: The plant-pathogenic fungus Fusarium oxysporum f.sp. lycopersici (Fol) has accessory, lineage-specific
(LS) chromosomes that can be transferred horizontally between strains. A single LS chromosome in the Fol4287 
reference strain harbors all known Fol effector genes. Transfer of this pathogenicity chromosome confers virulence 
to a previously non-pathogenic recipient strain. We hypothesize that expression and evolution of effector genes is 
influenced by their genomic context. 

Results: To gain a better understanding of the genomic context of the effector genes, we manually curated the 
annotated genes on the pathogenicity chromosome and identified and classified transposable elements. Both 
retro- and DNA transposons are present with no particular overrepresented class. Retrotransposons appear evenly 
distributed over the chromosome, while DNA transposons tend to concentrate in large chromosomal subregions. In 
general, genes on the pathogenicity chromosome are dispersed within the repeat landscape. Effector genes are 
present within subregions enriched for DNA transposons. A miniature Impala (mimp) is always present in their 
promoters. Although promoter deletion studies of two effector gene loci did not reveal a direct function of the 
mimp for gene expression, we were able to use proximity to a mimp as a criterion to identify new effector gene 
candidates. Through xylem sap proteomics we confirmed that several of these candidates encode proteins secreted 
during plant infection. 

Conclusions: Effector genes in Fol reside in characteristic subregions on a pathogenicity chromosome. Their 
genomic context allowed us to develop a method for the successful identification of novel effector genes. Since 
our approach is not based on effector gene similarity, but on unique genomic features, it can easily be extended to 
identify effector genes in Fo strains with different host specificities. 



Background 

The tomato-pathogenic fungus Fusarium oxysporum
forma specialis lycopersici (Fol) possesses a bipartite
genome. Eleven of the 15 chromosomes of the sequenced
strain (Fol4287) are syntenic with chromosomes of the 
sister species Fusarium verticillioides and the more distantly
related Fusarium graminearum, displaying high sequence
similarity and conservation of gene order [1].
These core chromosomes contain all housekeeping genes 
and few transposable elements (TEs). Additionally, 
Fol4287 possesses four chromosomes that are devoid of
housekeeping genes and accommodate 74% of the whole 
genome TE content and 95% of the class II TEs (DNA 
transposons). The four chromosomes and two smaller 
regions at the ends of two core chromosomes comprise 
the lineage-specific (LS) part of the Fol genome. The 
genes encoded in LS regions differ in their phylogenetic 
history from the genes on the core chromosomes [1,2]. 
The term lineage-specific (LS) reflects the largely clonal 
structure of the Fo species complex. Fo reproduces 
asexually and consists of many clonal lineages, which, if 
pathogenic, are grouped into host-specific formae 
speciales (ff. spp.) [3]. While some ff. spp. are monophy- 
letic, others are composed of several clonal lineages that 
appear to have independently acquired the ability to infect
the same host plant [4-6]. This polyphyletic origin
was likely caused by horizontal transfer of chromosomes 
encoding host specific virulence genes between Fo 
lineages, thereby allowing the distinction of members of 
a f. sp., not by overall genetic relatedness, but by the 
presence or absence of certain LS chromosomes [1]. 

In Fol, one LS chromosome (chromosome 14 of 
Fol4287) largely defines the pathogenic phenotype of this 
f. sp., i.e. the ability to cause wilt disease in tomato. 
Horizontal transfer of this pathogenicity chromosome 
from a tomato pathogenic isolate to a non-pathogenic 
isolate during co-cultivation resulted in novel tomato- 
pathogenic lineages, demonstrating that it contains 
genes that promote infection of tomato [1]. Among 
these genes are all known Fol4287 effector genes called 
SIX (Secreted In Xylem) genes. Like many other plant 
pathogens, Fol utilizes small, secreted proteins to pro- 
mote virulence by manipulating its plant host and 
suppressing host defense responses, typically through 
interaction with host proteins [7,8]. Six proteins are 
small, commonly cysteine-rich, lack homology to other 
proteins and have a signal peptide for secretion [8]. Six 
of the seven previously described Six proteins are 
encoded on the pathogenicity chromosome; the genomic 
location of SIX4, whose gene product is recognized by 
the tomato resistance proteins I and I-1, is unknown because
it is not present in the sequenced race 2 isolate
Fol4287. Although SIX genes were likely acquired by 
horizontal transfer of the pathogenicity chromosome, 
they are not functionally independent of the core gen- 
ome. Their expression requires the transcription factor 
Sge1 (SIX gene expression 1), which is encoded on a
core chromosome [9]. It is unknown whether Sge1
regulates SIX gene expression directly or indirectly, 
for example through the action of other transcription 
factors. 

Effector genes in other plant pathogens, such as 
Magnaporthe oryzae, Leptosphaeria maculans or Phytophthora
infestans, are also found proximal to TEs and
TEs have been proposed as the underlying agents that 
provide a plastic environment for the emergence of new 
virulence traits [10-12]. The potential of TEs to affect 
genome structure is a consequence of both their mobil- 
ity and their inherent structure. Generally, two different 
TE classes are distinguished by their transposition 
intermediate: RNA or DNA. Class I TEs (or retro- 
transposons) transpose via a "copy-paste mechanism" by 
copying themselves into an RNA-intermediate before 
inserting at a new site, while class II TEs (or DNA 
transposons) leave the donor site to reintegrate at an- 
other site via a "cut-paste mechanism", although the ori- 
ginal copy can also be retained [13]. Class I TEs are 
either flanked by terminal inverted repeats (TIRs), long
terminal repeats (LTRs) or simple non-coding regions.
Class II TEs are usually flanked by TIRs [14]. Special 
TE families are the MITEs (Miniature Inverted-repeat 
Transposable Elements), non-autonomous class II TEs of
short length, which are thought to have evolved from 
autonomous TEs by deletion of their transposase ORF 
[15]. Recombination between identical or highly similar 
TEs can cause structural rearrangements like deletions, 
inversions, duplications and translocations depending on 
the orientation and genomic location of the recombining 
TE members [16]. For an asexual fungus like Fol, TE- 
mediated recombination might represent a mechanism 
to create genetic variation in the absence of meiotic re- 
combination. Next to gross structural rearrangements, 
TEs also contribute to evolution of novel phenotypes by 
transposition into new sites. For example, insertion of 
the hAT transposon Drifter into the coding sequence of
an ancestral SIX1 homolog (SIX1-H) disrupted the open
reading frame (ORF) of SIX1-H, thus creating an effector
pseudogene [8]. In another case, insertion of a
Hornet-like transposon at the SIX4 locus of a Japanese
race 3 Fol isolate created a fusion protein, which was no
longer recognized by the corresponding I-1 tomato resistance
protein [17]. TE insertion might also influence
gene expression when it occurs within a promoter. 

To further our understanding of the molecular basis of 
pathogenicity of Fol towards tomato, we conducted a 
detailed annotation of the predicted proteins encoded by 
the Fol pathogenicity chromosome. In addition, to ad- 
vance our understanding of the potential role of the gen- 
omic context of effector genes in gene evolution or 
expression, we also annotated TEs on this chromosome. 
We thus obtained a detailed picture of the genomic 
landscape of the pathogenicity chromosome. Within this 
TE-rich landscape, we recognized mini-clusters of SIX 
genes. SIX genes are associated with two MITEs: a mimp 
upstream in all cases and, frequently, an mFot5 down- 
stream. Using promoter deletions at two SIX gene loci, 
we studied the influence of the mimp on SIX gene ex- 
pression. Finally, we were able to exploit the consistent 
presence of a mimp in the promoters of SIX genes and 
other virulence-associated genes to develop a method to 
identify candidate effector genes in F. oxysporum. 

Results 

Non-TE genes on the Fol pathogenicity chromosome 
group into a small set of functional classes 

Non-TE ORFs occupy only 13% of the DNA space on 
the pathogenicity chromosome of Fol, which consists of 
four supercontigs (sc) (sc 22, 36, 43, 51) in the most re- 
cent Fol genome assembly (Li-Jun Ma, personal commu- 
nication, Table 1). Most of the manually curated 245 
non-TE ORFs on this chromosome encode proteins of 
unknown function, which are annotated as hypothetical 
Table 1 Space occupied by TEs and non-TE ORFs on the pathogenicity chromosome

              In bp     Percent of sequence
non-TE ORFs   324923    13
TEs           581563    24
total         2457923   100



proteins or proteins with domains of unknown function 
(140 ORFs, Figure 1). Some of these unknown proteins 
have homologous sequences in F. oxysporum or in other 
fungi (Additional file 1). Two functional groups stand 
out among the predicted products of the remaining 103 
non-TE ORFs: secreted proteins (29) and proteins 
involved in secondary metabolism (35). Other functional 
groups include transcription factors (11), proteins with 
nucleic acid related functions (10), heterokaryon incom- 
patibility (Het) proteins (4), transporters (3), cyclins (3) 
and other intracellular functions (17), such as GTPases 
and protein kinases (Figure 1). As reported by Ma et al., 
there are no genes for housekeeping proteins on the 
pathogenicity chromosome [1]. Among the predicted 
secreted proteins, we find nine secreted enzymes, such 
as oxidoreductase, chitinase and glucanase, and 20 
secreted proteins of unknown function. Sixteen of the 
latter encode proteins smaller than 300 amino acids. 
Among those are the previously described effector genes 
SIX1, SIX2, SIX3, SIX5, SIX6 and SIX7 [18-20]. Proteins 



encoded on the Fol pathogenicity chromosome that are 
likely involved in secondary metabolism [1] include 
methyl transferases (6), cytochrome P450s (6) and 
glycosyltransferases (3). A putative secondary metabolite 
gene cluster on sc51 includes genes for three cyto- 
chrome P450s, a glycosyltransferase, a methyltransferase, 
a squalene-hopene cyclase and a homolog of Tri7, an 
acetyltransferase that is part of the trichothecene gene 
cluster in Gibberella zeae [21]. The genes in this putative 
secondary metabolite cluster are expressed during to- 
mato infection (Additional file 2) and might therefore be 
important for pathogenicity of the fungus. 

Currently, it is not known how F. oxysporum can 
transfer chromosomes horizontally from one strain to 
another. One hypothesis is that horizontal chromo- 
some transfer (HCT) occurs via anastomosis tubes - 
specialized, unbranched tubes that connect conidia or 
hyphae [2,22]. Anastomosis tubes result in heterokaryon 
formation between two fungal individuals [23]. This 
heterokaryon is only viable if the individuals have the 
same HET (Heterokaryon incompatibility) genotype; 
otherwise it undergoes a characteristic cell death reac- 
tion called an incompatibility reaction [24]. Four genes 
on the pathogenicity chromosome encode proteins with 
similarity to Het proteins in other fungi (FOXG_14188,
FOXG_14292, FOXG_14283, FOXG_14284).
Het proteins like Het-E from Podospora anserina often
harbor NACHT (NAIP, CIITA, HET-E, TP1) domains, or



Figure 1 TEs dominate on the pathogenicity chromosome. TEs and non-TE genes are presented as percentage of the total TE/gene content.
Genes coding for secreted proteins (including SIX genes) constitute one of the best-represented classes.
Legend categories: TEs; unknown function; other functions; transcription factors; nucleic acid-related functions; vegetative incompatibility-like; secondary metabolism; secreted proteins (secreted proteins < 300 aa, secreted proteins > 300 aa, secreted enzymes).
a highly divergent nucleoside phosphorylase (Pfs) linked
with protein-binding modules such as Ankyrin repeats 
[25]. FOXG_14188 encodes a protein with a NACHT 
domain, FOXG_14292 a protein with a Pfs domain and 
Ankyrin repeats, FOXG_14283 a protein with a Pfs, an 
ATPase and Ankyrin repeats and FOXG_14284 a protein 
with Pfs and Ankyrin repeats (Additional file 1). 
The presence of HET-like genes on the pathogenicity
chromosome may be seen to contradict HCT via anasto- 
mosis tubes, because additional HET loci would raise 
the chance of incompatibility between strains, involving 
programmed cell death of fused compartments. On the 
other hand, incompatibility does not appear to be a bar- 
rier to HCT [1]. Moreover, we do not know whether the 
HET-like genes on the pathogenicity chromosome are 
really involved in incompatibility. 

Since transfer of the pathogenicity chromosome is 
sufficient to confer pathogenicity towards tomato, the 
virulence genes on it must be expressed in the new host 
strain. We know that there is crosstalk between the 
core and pathogenicity chromosome, because the core 
chromosome-encoded Sge1 controls SIX gene expression
[26]. The presence of eleven genes encoding transcription
factors on the pathogenicity chromosome
suggests that transcription of genes on the pathogenicity 
chromosome may also be controlled by the chromosome 
itself. Among the transcription factors encoded on the 
pathogenicity chromosome are three copies of FTF1, 
which is induced upon plant infection [27], suggesting 
that at least a subset of the transcription factors encoded 
on the pathogenicity chromosome may be required for 
transcriptional reprogramming during plant infection. 

Next to transcription factors, nine other genes encode 
proteins with nucleic acid-related functions (Figure 1). 
Most of these proteins are predicted to function in 
structural rearrangements of DNA or in chromatin 
modifications. FOXG_16427 encodes a poly(ADP)-ribose 
polymerase (Parp1), which binds to damaged or single-stranded
DNA to recruit DNA repair enzymes [28].
Other genes encode putative components of the RNA 
silencing machinery, including closely spaced genes 
for an RNA-dependent RNA polymerase (FOXG_16453), 
an RNA interference and gene silencing protein 
(FOXG_16455) and an RNaseH domain-containing protein
(FOXG_16456). FOXG_14161 encodes a protein
homologous to the conserved eukaryotic kinetochore
protein Mis12, which is involved in correct segregation of
daughter chromatids during mitosis and meiosis [29]. 
FOXG_14165 encodes a protein with a BAH (bromo- 
adjacent homology) domain which may interact with 
gene silencing components [30]. Similarly, FOXG_14186 
encodes a chromodomain protein that typically recruits 
protein complexes to chromatin and reads the epigenetic 
code by recognizing lysine methylation [31]. Proteins 



involved in chromatin modification and RNA interference 
might influence gene expression during pathogenicity. 

The Fol pathogenicity chromosome harbours a large 
diversity of transposable elements 

To exhaustively identify TEs and TE relics on the 
Fol pathogenicity chromosome, we performed a self- 
BLASTN of the genome sequence, then identified multi- 
copy sequences and sorted them into non-redundant 
families. Secondly, we looked for inverted repeats (IRs) 
of at least 19 bp encompassing at most 5 kb of sequence. 
This expanded the set of identified TEs relative to an 
initial survey [1]. Taken together, TEs occupy about 
twice as much (24%) chromosomal DNA space as non- 
TE ORFs (13%, Table 1). 
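The inverted-repeat search described above can be sketched in a few lines. This is a minimal illustration only, assuming exact (mismatch-free) repeat arms and a plain DNA string as input; the function names and the toy sequence are hypothetical, and the tool actually used for this step is not stated in this section. Only the 19 bp arm length and the 5 kb span limit are taken from the text.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, arm: int = 19, max_span: int = 5000):
    """Return (start, end) intervals (0-based, end exclusive) in which an
    `arm`-bp word is followed by its exact reverse complement and the whole
    element, both arms included, is at most `max_span` bp long."""
    seq = seq.upper()
    n_words = len(seq) - arm + 1
    # Index every word of length `arm` by its start positions.
    positions = defaultdict(list)
    for i in range(n_words):
        positions[seq[i:i + arm]].append(i)
    hits = []
    for i in range(n_words):
        left = seq[i:i + arm]
        if set(left) - VALID:  # skip words containing N or other symbols
            continue
        for j in positions.get(reverse_complement(left), ()):
            # The right arm must start after the left arm ends.
            if j >= i + arm and j + arm - i <= max_span:
                hits.append((i, j + arm))
    return hits


if __name__ == "__main__":
    # Hypothetical toy element: 19 bp arm, 30 bp spacer, inverted copy of the arm.
    arm_seq = "ACGTTGCAAGGCTTAACCG"
    toy = "GGGGG" + arm_seq + "T" * 30 + reverse_complement(arm_seq) + "GGGGG"
    print(find_inverted_repeats(toy))
```

A real survey would also have to tolerate mismatches in the repeat arms and merge overlapping hits, which this sketch deliberately leaves out.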

Both Class I and Class II TEs (full length and fragments) 
are present in approximately equal numbers (266 Class I, 
249 class II, Table 2) on the pathogenicity chromosome, 
which is surprising because retrotransposons often dom- 
inate the TE fraction of a given genome [11,32-34]. For 
annotation of the TE classes we followed the classification 
system proposed by Wicker and colleagues that comprises 
both mechanistic and enzymatic criteria [14]. Class I TEs 
all transpose by transcribing themselves into an RNA 
intermediate, then reverse-transcribing the RNA by a TE- 
encoded reverse transcriptase and inserting into a new 
genomic region. There are three orders of class I TEs: 
Long-terminal-repeat (LTR) TEs, long-interspersed nu- 
clear elements (LINE) and short interspersed nuclear 
elements (SINE). 

LTR retrotransposons are similar to retroviruses and 
encode multiple enzymatic domains including Gag 
(a viral coat protein), protease, RNaseH, reverse tran- 
scriptase and integrase, flanked by long terminal repeats 
[14]. Within the LTR order we identified members of 
the Gypsy/Ty3 (27) and Copia/Ty1 (59) superfamilies, a
novel class I TE named Yaret2, which encodes integrase 
(IPR001584), RNaseH (IPR012337), reverse transcriptase 
(IPR013103) and a Zinc-finger (IPR001878) domain, as 
well as two novel solo-LTR families. Solo-LTRs can be 
the result of intrachromosomal or intraelement recom- 
bination between the LTRs, thereby removing the in- 
ternal domains and creating a solo LTR at the excision 
site [35]. Several of these LTR transposons have been 
previously recognized in Fo or in other pathogenic fungi. 
Nht2, for example, is also present on an LS chromosome
of Fusarium solani [36]. 

LINE elements lack the LTRs that are characteristic 
of the retroviral-like class I TEs. In this order we identified
31 MGR583-like elements and 34 Yaret1 and
Yaret1-like elements (24 and 10, respectively). MGR583
accompanies the effector gene AVR-PITA in some M.
oryzae isolates [12]. Yaret1 and Yaret1-like are novel LINEs.
Foxy (32 copies) represents the only TE of the SINE 



Table 2 Transposable elements on the Fol4287 pathogenicity chromosome

Classification    Designation    Number
Order    Superfamily    Family    Number    Full length number

Class I (retrotransposon) 

LTR Gypsy/Ty3 

Copia/Ty1 

unclassified 
solo-LTR 

LINE 



SINE 

unrelated 

Class II (DNA transposons) - Subclass 1 

Crypton 

TIR Tc1/mariner



237 

24 



55 



20 



32 
10 
208 



Pogo 



Tel 
hAT 



3 
70 



MAGGY-like retrotransposon (3 types) 
Skippy 

NHT2-like retrotransposon (5 types) 
Pcretro3-like retrotransposon 
Yaret2 

Yaret2 solo-LTR 

Gollum (NHT2-like retrotransposon type 3 LTR) 

MGR583-like LINE element 

Yaret1

Yaret1-like

Foxy 

Marsu 

FoCrypton 

Fot2 

Fot3 

Fot4 

Fot5 

Fot6 

Fot8 

Impala 

Folyt1

Folyt2 

Frodo 

Hornet 

Drifter 

NhORF4-like 

Sam 

YahAT1

YahAT2 

YahAT3 



237 

16 

8 

51 
20

16

31 

25 
Designations in bold letters indicate TEs that have been described in Fusarium oxysporum before.

class on the pathogenicity chromosome. Foxy appears to
be an active TE that is specific for Fusarium species 
[37]. Foxy elements are the most abundant class I 
TEs in Fol and they are evenly distributed over the 
pathogenicity chromosome and also throughout core 
chromosomes [1,38] (this manuscript). This dispersed 
distribution pattern is also apparent for the other class I 
TEs. Finally, we detected 10 copies of Marsu, which is a 
retrotransposon that cannot be classified as LTR, LINE 
or SINE. Copies of Marsu were first described in Fo f. 
sp. phaseoli where they were found downstream of the 
FTF1 gene [27]. Ramos et al. speculated that the Marsu 
element might be responsible for gene duplication 
events of FTF1 [27]. For most retrotransposon classes 
on the pathogenicity chromosome, we find only few full- 
length copies. Marsu is the marked exception: seven of 
the ten copies are full-length. Marsu copies are present 
in other Fol4287 LS regions, and two copies reside on 
core chromosomes. Although we did not detect identical 
copies within the genome sequence of Fol4287, the 
presence of moderately divergent copies and many 
full-length copies suggests that Marsu elements have been
active relatively recently. 



Compared to class I elements, class II elements are 
less evenly distributed on the chromosome and many 
aggregate in large chromosomal subregions (Figure 2). 
Class II elements are divided into two subclasses. 
Among subclass I we identified one Crypton copy. 
Cryptons encode a tyrosine-recombinase to cut and re- 
join recombining DNA strands. They were first identi- 
fied in human pathogenic fungi and were later found to 
be domesticated in vertebrates [39,40]. There are more 
Crypton copies present on other LS chromosomes, but 
none on core chromosomes. Within subclass II we identified
nine Helitron copies. Helitrons are unusual class II
TEs; instead of a 'cut and paste' mechanism they transpose
via a rolling-circle mechanism [14]. With this
transposition mechanism they often capture host genes 
and thus contribute to genome evolution [15]. At least 
eight of the nine Helitron copies on the pathogenicity 
chromosome are intact; one is truncated by a sequence 
gap (Additional file 1). All copies are 99-100% identical 
in sequence, and there are intact Helitron copies on core 
chromosomes, suggesting that Helitrons are still active. 

The best-represented order of class II TEs are the Ter- 
minal Inverted Repeat (TIR) TEs (Table 2). These TEs 



Figure 2 SIX genes reside in class II TE-enriched chromosomal subregions. TE densities and SIX gene locations were displayed in the IGV
Genome Browser. Supercontigs are ordered according to their position in the optical map of chromosome 14, ignoring gaps between them. The
positions of the SIX genes are indicated by stars. Numbers above the enlarged windows refer to position (kb) in the respective supercontig (sc).
Panels (in order): sc2.22, sc2.43, sc2.51, sc2.36. Track labels: SIX genes, class I TEs.

consist of a transposase ORF flanked by TIRs [14].
Within the Tc1/mariner superfamily, we found multiple,
diverse Fot lineages belonging to the pogo family.
This finding confirms the previously shown preferred 
localization of pogo elements on LS chromosomes [41]. 
We observed a similar diversification of Hop elements 
belonging to the Mutator family. Five Hop classes are 
present with one to 13 copies, most of which are not 
full-length, although Hop has been shown to be active in 
Fo [42]. Most TE families, including three Folyt copies 
and 16 Hornet copies, belong to the hAT family. Folyt 
has been identified as an expressed and active transposable
element in Fol by transposon trapping [43]. Hornet1
was discovered during analysis of transposons in Fo f. sp.
melonis [44]. The only copy of the hAT transposon 
Drifter adjoins the truncated effector gene SIX1-H [45]. 
Overall, as previously shown for some genomic regions 
in Fo f. sp. melonis (Fom), class II TEs seem to preferentially insert into
or close to each other, creating class II TE-enriched 
subregions on the Fol pathogenicity chromosome. 

These subregions are also enriched for MITEs. MITEs 
are non-autonomous TEs, which basically consist of 
TIRs flanking a short non-coding DNA sequence. Three 
different classes of MITEs are present on the pathogen- 
icity chromosome: 55 mimps (miniature Impalas), three 
Gimlis and 14 mFot5s (of which one is interrupted 
by a retrotransposon). MITEs require an associated 
transposase for transposition. Often, this associated 
transposase has similar TIRs [15]. For mFot5 transpos- 
ition, two TEs encoding intact Fot5 transposases on the 
pathogenicity chromosome might facilitate transposition. 
Mimps are transposed by the Impala transposase, which 
was shown to be active in the melon pathogenic strain 
Fo f. sp. melonis by transposon tagging [46,47]. However, 
in Fol4287 none of the three Impala copies, which reside on the
pathogenicity chromosome, encodes a full-length
transposase, suggesting that mimps are presently not actively
transposed in Fol4287. The large diversification of
the mimp lineages with members of more than four 
families and without two identical copies also suggests 
that mimps are not presently active in Fol4287. 

Mimps are associated with promoters of SIX genes 

SIX genes tend to reside in chromosomal subregions 
that are enriched for class II TEs, sometimes as mini- 
clusters (Figure 2, Additional file 1). For example, SIX1 
and SIX2 form a mini-cluster with one intervening gene 
(salicylate hydrolase homolog (SSH1)) and two interven- 
ing mimps, flanked by another mimp and a Fot5 
(Figure 3, see below). SIX3 and SIX5 form another mini-cluster
with an intervening mimp, with nearby mFot5
and Fot5 fragments. This mini-cluster is flanked on both
sides by inverted repeats, suggesting that it
might be transposable as a unit (Figure 3,
Additional file 1).

A closer inspection of the SIX gene promoters, which 
we pragmatically define as 1500 bp upstream of the 
start codon, revealed the presence of a mimp in the 
promoters of SIX1, SIX2, SIX3, SIX5, SIX6 and SIX7 
(Figure 4). The mimp in the SIX1 locus was revealed by 
re-sequencing, because in the Fol4287 genome assembly 
there is a sequence gap upstream of the SIX1 ORF. An- 
other sequence gap separates a mimp from SIX7. We 
were not able to bridge this gap by PCR and therefore 
cannot rule out that the distance between the mimp and 
SIX7 is bigger than 1.5 kb or that there is another mimp 
present that is closer to the SIX7 start codon. The 
avirulence gene SIX4/AVR1 of race 1 Fol strains, which 
is not present in Fol4287 (race 2), also harbors a mimp
in its promoter sequence (Figure 4). The pathogenicity 
chromosome harbors more than half of the mimps 
present in the Fol4287 genome (Table 3). The other 
copies are mainly present on the three other LS 
chromosomes with the exception of four mimps on core 
chromosomes, as observed before [48]. Only a subset of 
the mimps on the pathogenicity chromosome is present 
in putative promoters (i.e. within 1500 bp of a predicted 
start codon). While SIX1-7 all harbor a mimp in their 
promoter, only 8.3% of all annotated non-TE ORFs on 
the pathogenicity chromosome do so. This association of 
mimps with SIX gene promoters is highly significant 
(chi-square test p = 5.25E-16 for association by chance of 
mimps with the six known SIX genes on the Fol4287 
pathogenicity chromosome). Additional annotated ORFs 
with a mimp in the promoter region encode a bZIP transcription
factor, an integral membrane protein, an alpha-N
glucosaminetransferase, the Ftf1 transcription factor
(2 copies), a catalase-peroxidase, the oxidoreductase
Orx1, a homolog of the Verticillium dahliae avirulence
protein Ave1, a methyltransferase, a cytochrome P450
and a squalene-hopene cyclase. The latter three genes 
belong to the putative secondary metabolite cluster that 
is co-expressed during plant infection (see above). Like- 
wise, FTF1 has previously been shown to be expressed 
during plant infection [27]. The catalase-peroxidase and 
Orx1 are secreted in the xylem sap of Fol-infected tomato
plants [19] (this manuscript). Overall, therefore,
mimps seem to be preferentially associated with the 
promoters of genes that are expressed during plant 
infection. 
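The promoter criterion and the accompanying significance estimate can be illustrated with a short sketch. It assumes gene and mimp coordinates on the same supercontig are already available as 1-based inclusive intervals; the `Feature` type, the function names and the toy coordinates are hypothetical. The text does not spell out how the chi-square test was set up, so the one-degree-of-freedom goodness-of-fit form below (six SIX genes, all with a mimp, against the 8.3% background fraction) is an assumption and need not reproduce the reported p-value exactly.

```python
import math
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int   # 1-based, inclusive, start <= end
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def promoter_window(gene: Feature, length: int = 1500):
    """Interval covering `length` bp upstream of the start codon."""
    if gene.strand == "+":
        return max(1, gene.start - length), gene.start - 1
    # On the minus strand the start codon lies at the high coordinate.
    return gene.end + 1, gene.end + length


def has_mimp_in_promoter(gene: Feature, mimps, length: int = 1500) -> bool:
    """True if any mimp overlaps the promoter window of `gene`."""
    lo, hi = promoter_window(gene, length)
    return any(m.start <= hi and m.end >= lo for m in mimps)


def chi_square_gof(observed_hits: int, n: int, expected_fraction: float):
    """Chi-square goodness-of-fit (1 d.f.) for `observed_hits` out of `n`
    genes carrying a mimp, given a background fraction. Returns (chi2, p)."""
    exp_hit = n * expected_fraction
    exp_miss = n - exp_hit
    chi2 = ((observed_hits - exp_hit) ** 2 / exp_hit
            + ((n - observed_hits) - exp_miss) ** 2 / exp_miss)
    # For one degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    mimps = [Feature("mimp_a", 4000, 4200, "+")]
    genes = [Feature("gene_plus", 5000, 6000, "+"),
             Feature("gene_minus", 5000, 6000, "-")]
    for g in genes:
        print(g.name, has_mimp_in_promoter(g, mimps))
    print(chi_square_gof(observed_hits=6, n=6, expected_fraction=0.083))
```

Genes flagged by `has_mimp_in_promoter` would only be candidates; as in the text, secretion during infection still has to be confirmed experimentally.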

To see whether additional, potentially regulatory elements 
may be enriched in SIX gene promoters, we analyzed the 
promoter sequences of SIX1, SIX2, SIX3, SIX5, SIX6 and
SIX7 for enriched k-mers. Several overlapping 6- to 9-mers
were significantly enriched within these promoters. The
most frequent of these form the sequence TCGGCAGTT
(see Methods for details). Perfect matches to this sequence
are present in the SIX1 and SIX3/SIX5 promoters. Compared
to the entire gene set of the Fol4287 genome, the
association between the presence of at least one or two of
the 6-mers TCGGCA and GGCAGT and the 7-mer GGCAGTT
and the 1000 bp upstream region of effector genes appears
to be significant (Additional file 3).
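As a rough illustration of what a k-mer enrichment comparison involves, the sketch below counts, for each k-mer, the fraction of foreground promoters that contain it and divides by the corresponding background fraction. The function names, the add-one pseudocount and the toy sequences are assumptions made for this example; the statistic actually used is described in the Methods and is not reproduced here.

```python
from collections import Counter


def kmer_presence(seqs, k):
    """For each k-mer, count the sequences that contain it at least once."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def enrichment(foreground, background, k):
    """Ratio of foreground to background presence fractions per k-mer.
    An add-one pseudocount keeps the ratio finite for k-mers that are
    absent from the background set."""
    fg = kmer_presence(foreground, k)
    bg = kmer_presence(background, k)
    scores = {}
    for kmer, n in fg.items():
        fg_frac = n / len(foreground)
        bg_frac = (bg[kmer] + 1) / (len(background) + 1)
        scores[kmer] = fg_frac / bg_frac
    return scores


if __name__ == "__main__":
    # Hypothetical toy promoters; real input would be the upstream
    # sequences of the SIX genes versus those of all annotated genes.
    fg = ["AATCGGCAGTTAA", "CCTCGGCAGTTCC"]
    bg = ["ACACACACACAC", "GTGTGTGTGTGT", "AATCGGCATT"]
    scores = enrichment(fg, bg, 6)
    for kmer in sorted(scores, key=scores.get, reverse=True)[:4]:
        print(kmer, scores[kmer])
```

Overlapping high-scoring 6-mers such as TCGGCA, CGGCAG, GGCAGT and GCAGTT tile the longer sequence TCGGCAGTT, which is how short enriched words can be merged into a single candidate motif.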

Finally, we also examined the 1500 bp downstream of 
the STOP codons of SIX genes. mFot5 is present down- 
stream of SIX2, SIX4, SIX5 and SIX7 (Figure 4). The as- 
sociation of this MITE with the SIX genes is weaker 
than the mimp association with the SIX gene promoters, 
because it is not present downstream of all the SIX genes 
on the pathogenicity chromosome. 

SIX1 gene expression is not dependent on the presence 
of a mimp in the promoter 

We next wanted to know whether the mimp or the putative 
regulatory elements enriched in the SIX gene promoters 



are directly involved in transcriptional regulation of the SIX
genes. To test this, we designed two constructs to replace
different parts of the SIX1 promoter with a hygromycin resistance
cassette. Both deletions remove the mimp; the constructs
differ in that the
SIX1p1189 construct (1552 to 363 bp upstream of the translation
start site) deletes only three of the six conserved SIX
gene promoter 12-mers, while the SIX1p1230 construct
(-1552 bp to -323 bp) deletes five of these 12-mers
(Figure 5A).

First, we tested whether SIX1 was still expressed in the 
promoter deletion strains in vitro. Most SIX genes are
not highly expressed in vitro; their expression is only
switched on upon plant infection. However, a low
amount of SIX1 transcript is detectable in vitro [26]. To
our surprise, SIX1 was expressed in both SIX1p1189 and
SIX1p1230 promoter deletion strains despite the absence
of a large part of the SIX1 upstream region (Figure 5B). 
Six1 is recognized by the tomato resistance protein I-3
and triggers disease resistance in tomato plants carrying
the I-3 gene, thereby preventing extensive fungal infection
[49]. Upon plant infection, SIX1 was only expressed
from strains with the shorter SIX1p1189 deletion in
both susceptible and resistant tomato cultivars. All
transformed strains remained pathogenic towards tomato
without Fol resistance genes, indicating that they
were not affected in pathogenicity (Figure 5D). Consistent
with the in planta expression pattern, only the wild
type and strains with the SIX1p1189 promoter deletion
were avirulent on the resistant tomato cultivar. In contrast,
I-3 tomato cultivars that were infected with Fol
strains carrying the SIX1p1230 promoter deletion were
diseased, indicating the absence or reduced accumulation
of the Six1 avirulence protein (Figure 5D). Taken
together, deletion of the mimp did not impair SIX1 expression
in vitro or in planta, and this mimp is therefore
not required for transcriptional regulation of the
SIX1 gene. However, a promoter region including two 
TCGGCA elements appears to be required for SIX1 ex- 
pression during plant infection. 

SIX3/SIX5 promoter deletions reveal complex regulation 
at this locus 

To further investigate the functional role of mimps in
effector gene expression, we also designed promoter
deletion constructs for the SIX3-SIX5 locus. SIX3 and
SIX5 share the same 1365 bp upstream sequence. This
bidirectional promoter allowed us to test the expression
of two different SIX genes with the same promoter
deletion constructs. Like SIX1, SIX3 is also recognized
by a tomato resistance protein, I-2 in this case, and
expression of SIX3 and SIX5 is low but detectable
in vitro [18,26]. We designed three promoter deletion
constructs: SIX3p539 (-1059 to -520 bp relative to the
transcription start site), SIX3p807 (-1059 to -252 bp)
and SIX3p859 (-1059 to -200 bp). SIX3p539 deletes six
of the nine TCGGCA elements but does not include the
mimp, SIX3p807 deletes the six TCGGCA elements
and the mimp, and SIX3p859 additionally deletes one
more TCGGCA element (Figure 6A). Again, none of
these promoter deletions impairs expression of SIX3 or
SIX5 in vitro (Figure 6B). During plant infection, a
reduced level of SIX3 mRNA was detected in Fol strains
carrying the SIX3p539 deletion, but not in strains with



Table 3 Distribution of mimps in the Fol4287 genome

Genomic region           | Number of mimps | Mimps per Mb
pathogenicity chromosome | 54              | 21.14
other LS chromosomes     | 45              | 3.26
core chromosomes         | 4               | 0.09

Figure 5 Deletion of the mimp in the SIX1 promoter does not impair SIX1 expression, but a small region with a conserved motif is
required for SIX1 expression during plant infection. (A) Schematic representation of the SIX1 locus. Black lines: deleted promoter fragments
(deletion length in bp); pink box: mimp; yellow arrow: SIX gene; orange circles: sequence matching AAGTCGGCAGTT[AG] motif enriched in SIX1-7
[...] resistant cultivar, because it is not pathogenic (see D). A black box marks interactions where recognition of Six1 by I-3 is broken or where no
disease is caused. Error bars indicate the 95% confidence interval of the mean.



the SIX3p807 deletion. SIX5 is not expressed in either
SIX3p539 or SIX3p807 deletion strains (Figure 6C).
Remarkably, both SIX3 and SIX5 are expressed during
plant infection in Fol strains carrying the most extensive
promoter deletion, SIX3p859 (Figure 6C). With one
exception, all tested strains were still able to cause disease
on susceptible tomato cultivars and are thus not
generally impaired in pathogenicity (Figure 6D). Only strains
with the SIX3p859 deletion trigger a resistance response
in tomato plants carrying the I-2 resistance gene, while
the Fol strains with the SIX3p539 and SIX3p807
promoter deletions break I-2-mediated resistance,
consistent with the Six3 protein not being produced by
these strains (Figure 6C). Although a residual amount of
SIX3 transcript is present in the Fol strains carrying the
SIX3p539 promoter deletion, these strains are
virulent. This may be explained by the additional
requirement of SIX5 for I-2-mediated resistance
(manuscript in preparation).

From this set of experiments at two SIX gene loci, we
can conclude that the mimps are not required for
regulation of SIX gene expression. On the other hand,
deletion of a short region containing a single TCGGCA
element in the promoter of SIX1 abolishes SIX1
expression, suggesting that this motif might represent a
transcription factor-binding site (Figure 5C). In contrast,
at the SIX3-SIX5 locus additional deletion of a region
containing the same motif restores expression of both
SIX3 and SIX5 during plant infection (Figure 6C).

Figure 6 Deletion of the mimp in the shared promoter region of SIX3 and SIX5 does not affect expression of the two genes.

The presence of mimps in the promoters of SIX genes 
enables prediction of novel effector candidates 

Next, we wanted to test whether we can use the
consistent presence of a mimp in the upstream region of the
SIX genes to predict novel effector candidates. We
searched the Fol4287 genome for the presence of a
mimp TIR within 2 kb upstream of an ORF encoding a
protein with an N-terminal signal peptide for secretion
(as defined by SignalP). We also analyzed the xylem sap
proteome of Fol-infected tomato plants by mass
spectrometry to see which of the predicted effectors are
secreted by the fungus during plant infection.

By the in silico search for mimp association we
predicted 16 effector genes in Fol4287, which are located
on chromosomes 3, 6 and 14. These include three of the
known SIX genes on the pathogenicity chromosome:
SIX2, SIX3 and SIX6. SIX1 and SIX7 were not identified
because of sequencing gaps in the Fol4287 genome
assembly (see above). SIX5 is a small gene comprising three
exons. The first exon is unusually short and ends
directly after the encoded signal peptide for secretion.
Therefore, SIX5 escaped signal peptide prediction (by
SignalP) and thus was not identified with our approach.

Besides the known SIX genes, we identified nine
genes coding for small secreted proteins and four genes
coding for secreted enzymes with a mimp in the upstream
region (Table 4). The latter comprise several multi-copy
genes in the Fol4287 genome: two (non-identical)
ORX1 copies, two copies of a gene coding for a
catalase-peroxidase, two copies of a gene coding for a
metalloprotease and three copies of a gene coding for an
endo-polygalacturonase (Table 4). Both Orx1 and the
catalase-peroxidase proteins were identified with mass
spectrometry in the xylem sap of Fol-infected tomato plants,
suggesting that they may play a role during plant infection.

Table 4 Novel effector candidates identified by searching for genes with a mimp IR in their promoter

Encoded protein | Chromosome | FOXG or genomic location | Identified in search | SP (2) | Protein in xylem sap

SIX genes
Six1 (corrected) | 14 | FOXG_16418 (incorrect) | no (3) | yes | yes
Six2 | 14 | FOXG_16416 | yes | yes | yes
Six3 | 14 | FOXG_16398 | yes | yes | yes
Six5 | 14 | SC36[3273-3407] | no (4) | yes | yes
Six6 | 14 | FOXG_14246 | yes | yes | yes
Six7 | 14 | SC51[65216-65875] | no | yes | yes

Novel effector candidates
Six8 | 14, 14 | FOXG_17445, FOXG_16464 | no (4) | yes | yes
Six8b | 3, 3, 6, 6 | SC18[1122404-1122824], SC18[862700-863120], SC41[221648-222068], SC21[219855-220275] | yes | yes | no
Six9 | 14 | FOXG_14223 | yes | yes | yes
Six10 | 14 | FOXG_17457 | no (4) | yes | yes
Six11 | 14 | SC22[806692-807024] | yes | yes | yes
Six12 | 14 | SC51[62415-62753] | no (5) | no | yes
Six13 | 6, 6 | FOXG_17131 (5' extended), SC42[126863-127192] | yes | yes | yes
Six14 | 14 | SC36[135867-136180] | yes | yes | yes
FoAve1 | 14 | SC36[201730-202101] | yes | yes | no
conserved secreted protein | 14 | FOXG_14254 | yes | yes | no
secreted protein | 15 | SC38[202206-202388] | yes | yes | no
secreted protein | 14 | SC51[127492-128836] | yes | yes | no

Secreted enzymes
Orx1 | 14 | FOXG_14258; FOXG_14236 | yes | yes | yes
catalase-peroxidase | 6, 14 | FOXG_17130, FOXG_17460 | yes | yes | yes
metalloprotease | 3, 6 | SC47[78991-79260], SC42[41025-41294] | yes | yes | no

Distance mimp IR-ATG [bp] (1), column values in their original order:
1192; 211; 232; 1132; 668; (sequence gap); 109; 1026, 1972; 249; 384;
322, 852; 837; 1971; 211, 258, 681; 788; 1312; 1717; 1236; 554; 921;
394; 1215

(1) distance between the mimp IR and the ATG start codon.
(2) predicted signal peptide.
(3) not identified because of a sequence gap in the genome assembly.
(4) not identified because of a short first exon.
(5) not identified due to absence of a signal peptide.

Next to these two enzymes, we obtained protein 
sequences for four of the nine predicted effector proteins 
from the xylem sap proteome. Additionally, we identified 
three more small proteins in the xylem sap of infected to- 
mato plants that were not predicted by our in silico search. 
We named the genes for which we found the protein 
products in xylem sap SIX8 - SIX14; one additional gene 
we named SIX8b for its high similarity to SIX8 (Table 4). 
Upon inspection of the regions upstream of their respect- 
ive genes we could always identify a mimp. SIX8, SIX10 
and SIX12 were not found with the in silico search be- 
cause no signal peptide was detected. Similar to SIX5, 
SIX8 and SIX10 have a short first exon and therefore the 
signal peptide was not recognized by SignalP. SIX12 is an 
unusual effector gene: it does not encode a protein with a 
canonical signal peptide for secretion. 

In contrast to the other SIX genes in Fol, SIX8 is not a
single gene, but is present in two copies on the
pathogenicity chromosome (sc36 and sc51) and in subtelomeric
regions on chromosomes 2, 3 and 7 in a repeated block of
around 7400 bp. This block includes incomplete copies of
the class II TEs Marsu and YahAT7, a Foxy and a gene
encoding an unknown protein (Figure 3, Additional file 1).
The repeated sequences flanking the SIX8 genomic block
on sc36 suggest that SIX8 is present in a subtelomeric
region. Furthermore, two copies of a related gene, SIX8b,
are present on chromosomes 3 and 6 each in the Fol4287
genome. In total, there are nine SIX8 and four SIX8b
copies in the Fol4287 genome sequence. Both SIX8 and SIX8b
appear to be preceded by a complex structure of (partial)
mimps and mimp IRs (Additional file 4).

Like the SIX1-7 genes described above, the newly
identified SIX genes, as well as several additional potential
effector genes for which we did not find evidence for
expression in planta, reside in class II TE-rich subregions
(Figure 2, Table 4, Additional file 1). SIX11 resides in a
region that includes SIX6, three genes coding for
conserved secreted proteins, one gene for a MFS
transporter and one for a fumarate reductase/succinate
dehydrogenase, a FTF1 homolog and the ORX1 gene.
SIX14 is part of a cluster containing SIX1, SSH1 and
SIX2 (Figure 3). SIX10 and SIX12 make up a mini-cluster
with SIX7. Similar to the SIX3/SIX5 mini-cluster, SIX12
is flanked on both sides by inverted repeats, suggesting
that it may be mobilized by a transposase that
recognizes these IRs. SIX13 (FOXG_17131 - 5' extended)
is part of a duplicated region on chromosome 6 (sc42),
which is different from the interchromosomal
duplication shared with chromosome 3.

Taken together, we have developed a method to
predict novel effector genes in genomes of F. oxysporum
based only on the following characteristics: (1) coding
for small, secreted proteins, (2) harboring mimps or
inverted repeats of mimps within 2 kb upstream of the
start codon. We validated this method by mass
spectrometric analysis of the xylem sap of Fol-infected tomato
plants and confirmed in planta secretion of several
predicted novel candidate effectors. These novel SIX
genes represent ideal candidates for functional analysis.

Discussion 

Effector genes on the Fol pathogenicity chromosome are 
associated with chromosomal subregions enriched in 
class II transposable elements 

TEs dominate the Fol pathogenicity chromosome with
large aggregates of class II TEs and more evenly
distributed class I TEs. Interspersed within this TE-rich
landscape are mostly single non-TE ORFs, a putative
secondary metabolite cluster and the SIX gene
mini-clusters. In many plant and fungal species with expanded
genomes, retrotransposons are mainly responsible for
genome expansion. Their mode of replication, which
involves creating new copies during every transposition
cycle, can rapidly increase genome size. Often, a single
or few class I TEs account for the majority of TEs
present in a genome. The maize genome, for example,
consists of 76% class I TEs, with the Gypsy family
element Huck and the Copia element Ji together accounting
for nearly one quarter of the genome sequence [33].
Similarly, in the obligate fungal pathogen Blumeria
graminis f. sp. hordei, the class I TE I (Line/Sine) alone
occupies 17.2% of the entire genome space [32]. On the
Fol pathogenicity chromosome, we do not observe such
a massive expansion of retrotransposons. Instead, large
aggregates of class II TEs are associated with genes
involved in pathogenicity, such as the SIX gene
mini-clusters. The tendency of class II TEs to concentrate in
subchromosomal regions might result from
recombination of their IRs with IRs of the same or a similar TE
family. Occasionally, SIX genes might be trapped
between the IRs and subsequently transposed together
with the TE, resulting in the observed presence of
SIX genes within class II TE-enriched chromosomal
subregions. Support for this hypothesis stems from the
observation that IRs directly flank SIX12 and the SIX3/
SIX5 mini-cluster, although the transposase recognizing
these IRs remains unknown. Similarly, the highly
dynamic genomic location of the small, subtelomeric gene
family AVR-Pita within the M. oryzae population has
been attributed to the retrotransposons Inago-1 and
Inago-2, which flank AVR-Pita. These are thought to be
involved in multiple translocation events of AVR-Pita,
thereby facilitating a cycle of loss and gain of recognition
by rice cultivars encoding the cognate Pita resistance
protein [12]. Next to retrotransposons, some DNA
transposons have also been observed proximal to fungal
effector genes. In Leptosphaeria maculans, putative
effectors are clustered in AT-blocks together with three
significantly over-represented TEs (one class I and two
class II) [11]. Clustering of virulence genes might
provide a selective advantage, because all captured genes
experience the same genomic environment, e.g. an open
or closed chromatin structure, thereby being
simultaneously amenable for transcriptional regulation [50]. This
might facilitate coordinated gene expression during
plant infection.

MITEs and Fol evolution 

Mimps are always found within 1500 bp upstream of the
translation start site of SIX genes as well as upstream of
several other genes that are expressed during plant
infection. Mimps are uniformly small in size, ranging from
200 to 220 bp. Their central region has no coding
capacity and is flanked by ~27 bp TIRs that resemble
the TIRs of the Tc1/mariner transposase Impala [51].
Impalas have been shown to transpose mimps in a
heterologous system [52]. However, none of the Impala copies
in the Fol genome are intact, suggesting that mimps are
not currently transposed in Fol4287. In the past there
appear to have been several bursts of mimp
amplification, resulting in at least six mimp subfamilies present in
Fol [51]. Strikingly, more than half of the mimps in
Fol4287 are present on the pathogenicity
chromosome and the other mimps, with four exceptions, are
restricted to the LS regions (Table 3) [51].

mFot5s reside downstream of the SIX1/SSH1/SIX2, the
SIX3/SIX5 and the SIX10/SIX12/SIX7 mini-clusters as
well as downstream of the solo SIX9 gene (Figure 3,
Additional file 1). mFot5 is also part of the putative
secondary metabolite cluster that is co-expressed during
Fol infection of tomato plants (Figure 3, Additional file
2). Downstream of SIX11 there is no mFot5, but a full-length
Fot5. The same is true for ORX1, which encodes an
oxidoreductase that is secreted by Fol during tomato
infection. mFot5 is a pogo-like MITE, less than 500 bp
long, with TIRs similar to those of the Fot5 transposon.
In contrast to the lack of intact Impalas for mimp
transposition, Fol4287 possesses around 64 intact Fot5
transposase ORFs that could mobilize mFot5s [41].

What could be the function of mimps in promoters of 
effector genes? 

Strikingly, mimps are not only present in SIX gene
promoters, but also in the promoters of several other
genes that are expressed during plant infection. Among
these are the gene for the oxidoreductase Orx1 and two
genes of the presumptive secondary metabolite gene
cluster. One possible scenario is that the mimp is a
domesticated TE, which has adopted a function as
transcription factor binding site, perhaps for Sge1, the
transcription factor regulating SIX gene expression [26]. We
tested this by deleting fragments of varying length in the
promoter of SIX1 and the bidirectional, shared promoter
of SIX3 and SIX5. In a strain in which the mimp in the
promoter of SIX1 was deleted (Fol4287SIX1p1189), SIX1
expression in vitro and in planta was the same as in wild
type. Likewise, SIX3 and SIX5 expression was not
affected in a strain in which the mimp was absent in
their shared upstream region (Fol4287SIX3p859).
Therefore, we can rule out a direct involvement of the mimp
in transcriptional regulation of SIX gene expression.

We did, however, observe that the presence or absence
of other promoter regions affects gene expression at the
SIX1 and the SIX3/SIX5 locus. By comparing two
different promoter deletions, we found that SIX1 expression
in planta requires a 41 bp region that includes one of
the conserved TCGGCA elements that we found to be
enriched in the SIX gene promoters (Figure 5). In
contrast, SIX3 and SIX5 are not expressed from the two
shorter promoter deletion strains, but expression of both
genes is restored in the strain with the longest promoter
deletion. The longest deletion additionally includes one
of the TCGGCA elements (Figure 6A), which in this
instance may mediate the action of a transcriptional
repressor. The association of this element with upstream
regions of effector genes is statistically significant (see
Methods and Additional file 3 for details). Also, a perfect
match to the extended motif (AAGTCGGCAGT) is
present in the upstream regions of three genes encoding
enzymes that we found in our analysis of the xylem sap
proteome: FOXG_11769 on chromosome 10, encoding a
glycosyl hydrolase, and the closely related FOXG_14234
on the pathogenicity chromosome and FOXG_17180 on
an unpositioned scaffold, encoding
peroxidase-catalases. Nevertheless, the function of this putative
regulatory sequence remains to be established.

Interestingly, SIX1 as well as SIX3 and SIX5 were
weakly expressed in all promoter deletion strains
in vitro, but not in planta. In absence of a plant host,
SIX gene expression is usually very low [26], while it is
strongly induced upon plant infection [53]. SIX genes
are only needed during plant infection; therefore the
fungus might actively suppress SIX gene expression in
the absence of a plant host. One way of suppressing gene
expression is by modification of chromatin to a
repressive, closed state. Repressive chromatin structures often
involve histone modifications such as H3K9 methylation
[54]. One origin of such repressive chromatin structures
is TE silencing, often guided by small RNAs transcribed
from the TE [55]. In plants of the Solanaceae family,
MITEs proximal to R gene loci were shown to produce
small RNAs that recruit the histone methylation
machinery for TE silencing, resulting in the formation of
closed chromatin [56]. TE silencing of the MITEs
surrounding the SIX genes might likewise create a
repressive chromatin environment, which may serve as
a first layer of SIX gene regulation. Upon stress, such
as during plant infection, TEs might be derepressed
as shown for the oomycete pathogen Phytophthora
ramorum [57], thus creating an open chromatin
structure. Binding of transcriptional activators or repressors
would be possible in an open chromatin state and
provide the basis for a second layer of regulation of SIX
gene expression.

Identification of novel effector candidates 

We identified eight novel (candidate) effector genes
based on the presence of a mimp in their promoters
and/or the presence of their protein product in xylem
sap of infected plants. Five of these genes (SIX8b, SIX9,
SIX11, SIX13, SIX14) were identified by the in silico
search and validated by the analysis of xylem sap from
infected tomato plants. Like the previously identified
SIX1 and SIX7 genes, SIX10 escaped the in silico
identification due to a sequencing gap close to its promoter.
SIX12 encodes an unusual effector lacking a
recognizable N-terminal signal peptide for secretion via the
classic endoplasmic reticulum/Golgi route.
Nevertheless, the Six12 protein is present in the xylem sap of
infected tomato plants and therefore might be secreted
via an unconventional protein secretion route [58].
SIX13 encodes the only effector known so far that is
located on a LS chromosome other than the
pathogenicity chromosome. We also identified FoAVE1 as a gene
harboring a mimp in its promoter, but we detected neither
the FoAve1 protein in the xylem sap nor
FoAVE1 mRNA in infected plants (results not shown).
Apparently, in the strains used here FoAVE1 is not
expressed during infection, although it was shown to be
able to elicit Ve1-mediated resistance in a heterologous
system [59]. FoAve1 might be part of a silent effector
reservoir together with the other three genes that
encode small, secreted proteins and harbor a mimp in their
promoters, but are not expressed during infection.

Some of the genes we identified here have been subject
to gene or segmental duplications. ORX1 is present in
two similar but not identical copies on the pathogenicity
chromosome (FOXG_14258, FOXG_14236; Additional
file 1). Two other genes, FOXG_17460 on the
pathogenicity chromosome and FOXG_17130 on chromosome 6,
both encode a metalloprotease. Apart from a missing 3'
end of FOXG_17460 due to a sequencing gap, the two
genes and their promoters are identical, indicating a
recent duplication event. SIX13 is also duplicated. In both
cases the duplicated gene copies do not harbor a mimp
in their promoter. SIX8b is present in four identical
copies due to an intra- and interchromosomal segmental
duplication within and between chromosomes 3 and 6
[1]. This duplicated chromosomal segment corresponds
to another small chromosome that can be transferred
horizontally from the strain Fol007 [1]. Progeny strains
possessing both the pathogenicity chromosome and the
other small chromosome are more aggressive towards
tomato than progeny strains with only the pathogenicity
chromosome. At present, we do not know which gene(s)
on the small chromosome (corresponding to sc18 in the
Fol4287 genome) contributes to pathogenicity towards
tomato; Six8b was not found in the xylem sap
proteome.

In summary, mimps are associated with the promoters
of all small in planta secreted proteins, as well as several
enzymes. Our strategy for in silico detection of effector
genes in F. oxysporum is limited by three factors: 1)
imperfect conservation of the IRs of a mimp, 2) sequencing
gaps in the genome assembly and 3) absence of a
canonical N-terminal signal peptide for secretion. The impact
of the first two factors may be alleviated by more advanced
methods for mimp detection and genome assembly. The
third factor, absence of a canonical signal peptide, can be
due either to secretion via an unconventional route or to
a failure of SignalP to predict a signal peptide, as was
the case for SIX5 or SIX8. In the latter case,
incorporation of gene structure (intron/exon) predictions or
transcript sequences will be helpful. Overall, our
approach presents a powerful tool to predict novel
effectors and other virulence factors in F. oxysporum.

Conclusions 

Class II TEs are much less evenly distributed over the
Fol pathogenicity chromosome than class I TEs. Effector
genes reside as single genes or mini-gene clusters within
class II TE-enriched chromosomal subregions. Two
MITEs are closely associated with effector genes. A
(partial) mimp is always present in effector gene promoter
regions and an mFot5 is frequently present downstream
of the effector gene mini-clusters. We could exclude a
direct involvement of the mimp in effector gene
expression by making promoter deletion strains for two
effector gene loci followed by gene expression analysis and
tomato pathogenicity assays. Overall, the unique
association of effector genes and mimps allowed us to develop
a method to successfully predict candidate effector
genes. For most of these genes, the corresponding
protein was found by mass spectrometry in the xylem sap
of tomato during Fol infection. Our method can easily
be extended to predict novel effector genes in Fo strains
with different host specificities.

Methods 

Plant lines and fungal strains 

The following tomato (Solanum lycopersicum) lines were
used (Fol resistance genes between brackets): 90E402F
(I-1) [60,61]; 90E341F (I-2) [62]; E779 (I-3) [60] and C32
(no I gene) [63]. The following Fol strains were used:
Fol007 (race 2), Fol4287 (race 2), Fol004 (race 1),
Fol4287SIX1p1189, Fol4287SIX1p1230,
Fol4287SIX3p539, Fol4287SIX3p807, Fol4287SIX3p859,
Fol4287Δsge1 and Fol007Δsix1. See Rep et al. [45] for a more
detailed description of the wild type Fol strains.

Identification and annotation of TEs 

Repetitive DNA elements were identified by performing
a self-BLASTN against the Fol4287 genome sequence,
then using a custom PERL script (Amyott S.G. et al.,
manuscript in preparation), which identifies multi-copy
sequences and sorts these repeated sequences into
non-redundant families. Additional TEs with terminal
inverted repeats were identified by searching the Fol4287
genome for inverted repeats of at least 19 bp
encompassing at most 5 kb of sequence. BLAST was used
to find all instances (full length or partial) in the Fol4287
genome. Additional file 5 contains prototypes for all
newly identified TEs.
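
The inverted-repeat criterion can be illustrated with a minimal Python sketch. It is not the script used in this study: the function names are hypothetical, only exact reverse-complement matches are considered, and the span is measured from the start of the left arm to the end of the right arm, which is an assumption about how "encompassing at most 5 kb" was applied.

```python
from collections import defaultdict


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_inverted_repeats(seq, min_len=19, max_span=5000):
    """Return (left_start, right_start) pairs of exact inverted repeats.

    A pair is reported when the min_len-mer at left_start occurs as its
    reverse complement at right_start, the two arms do not overlap, and
    the whole element (both arms included) is at most max_span bp long.
    Coordinates are 0-based.
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - min_len + 1):
        index[seq[i:i + min_len]].append(i)

    pairs = []
    for i in range(len(seq) - min_len + 1):
        partner = reverse_complement(seq[i:i + min_len])
        for j in index.get(partner, ()):
            if j >= i + min_len and j + min_len - i <= max_span:
                pairs.append((i, j))
    return pairs
```

Longer repeats would show up as runs of consecutive pairs and would need to be merged before the flanked sequence is used as a BLAST query.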

Promoter deletion constructs 

The promoter deletion constructs for the SIX1 and
SIX3/SIX5 promoters were made by PCR amplification
of the sequences flanking the part of the
promoter that was to be deleted for homologous
recombination, and their insertion in front of and behind the
hygromycin resistance gene in the vector pRW2h (see
below). For SIX1: for both deletion constructs an 829 bp
upstream fragment was cloned into pRW2h [64]
between the PacI and Acc65I sites, and a 1093 bp and a 1052
bp downstream fragment, for the SIX1p1189 construct
and the SIX1p1230 construct respectively, were cloned
into the XbaI site of pRW2h. For SIX3: a 1001
bp upstream fragment was cloned into pRW2h between
the PacI and Acc65I sites and a 1332 bp (SIX3p539), a
1064 bp (SIX3p807) and a 1012 bp (SIX3p859)
downstream fragment was cloned into the XbaI site of
pRW2h. Transformation of these constructs to Fol4287
was done with Agrobacterium as described earlier [65].

Tomato disease assay 

Ten-day-old tomato seedlings were inoculated with a
fungal spore suspension and disease was scored after
three weeks as described earlier [49]. The outcome of
the disease assays was quantified in two ways: 1) average
plant weight above the cotyledons and 2) phenotype
scoring according to a disease index ranging from zero
(no disease) to four (heavily diseased or dead) [49].

Fol gene expression analysis 

For in vitro expression analysis, Fol mycelium was
harvested after three days of growth at 25°C and 175 rpm
in minimal growth medium (3% sucrose, 1% KNO3 and
0.17% yeast nitrogen base without amino acids and
ammonia). For in planta expression analysis, ten-day-old
tomato seedlings were inoculated with fungal spore
suspensions as described above and roots were sampled
eight or nine days after inoculation. From the collected
material, RNA was isolated using TRIzol reagent (Gibco)
followed by phenol-chloroform extraction. The isolated
RNA was used to make cDNA using Promega RNasin
(ribonuclease inhibitor) and Gibco SuperScript II RNaseH
reverse transcriptase according to the manufacturer's
instructions. Primers used for RT-PCR analysis are listed
in Additional file 6.

Identification of novel effector candidates 

Based on published sequences of prototypes of mimp1-4
as well as mimps present in promoters of SIX1-7, a
consensus mimp 3' IR was defined as
'TT[TA]TTGCNNCCCACTG'. A PERL script was used to find
instances of this pattern in the genome sequence of
Fol4287, downloaded from the Broad Institute website
(http://www.broadinstitute.org/annotation/genome/fusarium_group/MultiHome.html).
For 150 of the 158 matches to this pattern, the next
dinucleotide was 'TA', which is the required target site
for mimps and Impalas.
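
As an illustration only, the pattern search can be sketched in Python rather than PERL. The consensus string is taken from the text; the function names, the treatment of N as any of A, C, G or T, the scanning of both strands and the 0-based coordinates are assumptions of this sketch, not a description of the original script.

```python
import re

# Consensus mimp 3' IR from the text; N = any base, [TA] = T or A.
MIMP_IR = "TT[TA]TTGCNNCCCACTG"


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_mimp_irs(seq):
    """Return (start, strand, next_dinucleotide) for every IR match.

    start is the 0-based position of the matched interval on the forward
    strand; next_dinucleotide is the two bases that follow the match in
    the orientation of the match (expected to be 'TA' for most hits).
    """
    seq = seq.upper()
    # The lookahead lets overlapping matches be reported as well.
    regex = re.compile("(?=(" + MIMP_IR.replace("N", "[ACGT]") + "))")
    hits = []
    for m in regex.finditer(seq):
        end = m.start(1) + len(m.group(1))
        hits.append((m.start(1), "+", seq[end:end + 2]))
    rc = reverse_complement(seq)
    for m in regex.finditer(rc):
        end = m.start(1) + len(m.group(1))
        hits.append((len(seq) - end, "-", rc[end:end + 2]))
    return hits
```

Counting the hits whose third field equals 'TA' gives the kind of tally reported above for the target-site dinucleotide.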

For each mimp IR match, all open reading frames
(ORFs) starting with an ATG and of at least 25 codons
within 2000 bp downstream of the IR were selected. The
ORFs were translated and the translation products
submitted to signal peptide prediction by SignalP
(http://www.cbs.dtu.dk/services/SignalP/). If positive, the
instance was recorded (mimp IR sequence, translation
product of ORF and their positions in the Fol4287
contig). The sequence surrounding this instance was
retrieved and manually inspected to define the full ORF
of the candidate effector gene.
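
The ORF selection step can be sketched as follows. This is a hedged illustration in Python, not the script used here: the function name is hypothetical, an ORF is assumed to qualify when its ATG lies within the 2000 bp window, ORFs are read to the first stop codon or the end of the sequence, and only the strand given as input is scanned (for an IR on the minus strand the reverse complement would be passed in). Signal peptide prediction itself is left to SignalP and is not reproduced.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def orfs_downstream(seq, ir_end, window=2000, min_codons=25):
    """Return (atg_position, protein) for ORFs starting after a mimp IR.

    ir_end is the 0-based position just past the IR match. Every ATG in
    any frame within the window is tried; the stop codon is not counted
    towards min_codons. Nested ATGs in one frame are reported separately.
    """
    seq = seq.upper()
    window_end = min(len(seq), ir_end + window)
    found = []
    for start in range(ir_end, window_end - 2):
        if seq[start:start + 3] != "ATG":
            continue
        protein = []
        for pos in range(start, len(seq) - 2, 3):
            aa = CODON_TABLE.get(seq[pos:pos + 3], "X")  # X = ambiguous codon
            if aa == "*":
                break
            protein.append(aa)
        if len(protein) >= min_codons:
            found.append((start, "".join(protein)))
    return found
```

Because introns are ignored, genes with a very short first exon yield a translation that diverges from the real protein soon after the signal peptide, which is consistent with the missed predictions described for SIX5 and SIX8.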

In silico promoter analysis 

To find potential regulatory elements in promoters of
effector genes, we first identified enriched k-mers in the
concatenated upstream regions of SIX1, SIX2, SIX3,
SIX5, SIX6 and SIX7, using Compseq
(http://emboss.bioinformatics.nl/cgi-bin/emboss/compseq). As upstream
regions we used here the sequences between the
upstream mimp and the ATG, to avoid identification of
sequences within mimps (especially the conserved
inverted repeats). We looked for enriched 6mers, 7mers,
8mers and 9mers in both strands. Among the most
frequent 6mers and 7mers, we found two classes: (1) a
diversity of AT-sequences and (2) a small set of
overlapping sequences that were present in one or more
instances in all, or all but one, upstream regions. The
most frequent sequence elements of the second class
were the 6mers TCGGCA (16), GGCAGT (14),
CGGCAG (11) and GCAGTT (11) and the 7mers
GGCAGTT (11), TCGGCAG (9) and CGGCAGT (7).
The overlap of these 6mers and 7mers is the 9mer
TCGGCAGTT. This is also the most frequent 9mer,
which occurs 6 times in the upstream regions, namely
in those of SIX1 (2x), SIX3 (1x) and SIX5 (3x). Except
two palindromic AT-rich sequences (TTTTAAAA and
TATATATA), the most frequent 8mers matched this
9mer: TCGGCAGT (6), CGGCAGTT (7), or extend it:
GGCAGTTA (6). Two other frequent 8mers extend the
sequence on the other end: AAGTCGGC (4) and
AGTCGGCA (4). Additional overlapping, enriched
9mers (each occurring 3 times) further extend the
combined sequence to the consensus
AAGTCGGCAGTT[AG]A.
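
The word counts above were obtained with Compseq; purely as an illustration, the same kind of tally can be sketched in Python. The function names are hypothetical, and, unlike a count on concatenated sequences, this sketch counts each upstream region separately so that no k-mer spans the junction between two regions.

```python
from collections import Counter


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_kmers(regions, k, both_strands=True):
    """Count all k-mers over a list of upstream regions."""
    counts = Counter()
    for region in regions:
        region = region.upper()
        strands = [region]
        if both_strands:
            strands.append(reverse_complement(region))
        for strand in strands:
            for i in range(len(strand) - k + 1):
                kmer = strand[i:i + k]
                if set(kmer) <= set("ACGT"):  # skip words with N or gaps
                    counts[kmer] += 1
    return counts


def regions_with_kmer(regions, kmer):
    """Number of regions containing the k-mer on either strand."""
    rc = reverse_complement(kmer)
    return sum(1 for r in regions if kmer in r.upper() or rc in r.upper())
```

Calling count_kmers(regions, 6).most_common(20) on the real upstream sequences would list the most frequent 6mers; a palindromic word is counted once per strand in this sketch.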

To assess the significance of the occurrence of this
motif in the upstream regions of effector genes, we
analysed the 17708 upstream regions of Fol4287 genes,
defined as 1000 bp upstream of the predicted translational
start codon. This analysis is summarized in Additional
file 3. Briefly, we calculated the probability that the
two most frequent (and overlapping) 6mers, TCGGCA and
GGCAGT, and the most frequent 7mer, GGCAGTT (a one-base
extension of the second 6mer), occur at least once or at
least twice in the upstream regions of effector genes by
chance. We did this both for the original set of effector
genes used to find the pattern (SIX1-3 and SIX5-7), and
for the entire set of identified effector genes (including
SIX8b). All p values were lower than 0.05. The weakest
associations were between at least one TCGGCA element and
the original set (p = 0.024) and between at least one
GGCAGT element and the entire set (p = 0.015).
Associations with at least two occurrences were more
significant in all cases. Association with the entire set
was slightly more significant for the TCGGCA element
(at least once or at least twice) and for at least two
occurrences of the GGCAGT element. The other associations
were weaker with the entire set of effector genes.
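
One common way to express such a chance probability is a hypergeometric tail, sketched below in Python. This is an assumption for illustration only: the actual calculation is documented in Additional file 3 and may differ. The function name is hypothetical, and the values for K and x in the usage lines are placeholders, not counts from this study; only N = 17708 and the six genes of the original set are taken from the text.

```python
from math import comb


def hypergeom_tail(N, K, n, x):
    """P(X >= x) for a hypergeometric draw.

    N: all upstream regions in the genome
    K: regions among those N that contain the element
    n: upstream regions of the effector genes (the sample)
    x: effector upstream regions that contain the element
    """
    favourable = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(x, min(n, K) + 1)
    )
    return favourable / comb(N, n)


# N and n follow the text; K and x are made-up placeholders.
p = hypergeom_tail(N=17708, K=2000, n=6, x=5)
print(f"p = {p:.2e}")
```

The same function can be applied to "at least twice" by letting K and x count regions with two or more copies of the element.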

Xylem sap collection, mass spectrometry and label-free
quantitative proteomics

Fol007 was used for tomato inoculation. Four-week-old
tomato plants (C32) were inoculated, after removing part
of the root system, with a Fol spore suspension (5 x 10^6
spores mL^-1) or with water as a negative control, and
potted. Fourteen days post inoculation (dpi), xylem sap
was collected as described [66,67]. Briefly, stems were
cut below the second true leaf and the plant was placed
in a horizontal position. Sap bleeding from the cut
surface was then collected for at least 6 h in tubes
placed on ice. The collected xylem sap was stored
at -20°C.

For label-free protein quantification, 25 plants per
inoculum were inoculated with Fol007 or water. Xylem sap
was isolated as described above from four independent
biological replicates. A fraction of the sap was used for
immunoblotting; the remainder was concentrated with a
Centricon Plus-70 (Millipore) unit to a final volume of
200-300 µl. The protein concentration was determined with
the bicinchoninic acid method (Sigma). After
trichloroacetic acid/acetone precipitation, protein
isolated from plants inoculated with water or Fol007 was
dissolved in sample buffer at equal concentration
(1.5 µg/µl) and 30 µl per sample was loaded on the
SDS-PAGE gel. SDS-PAGE was performed with Hoefer Mighty
Small SE250 minigel equipment (Amersham Biosciences AB,
Uppsala). After a short run, Coomassie PageBlue™
(Fermentas) was used to visualize the proteins in the
gel. For each xylem sap sample, one gel slice containing
all proteins was cut from the Coomassie-stained gel.
In-gel digestion was performed as described by Rep et al.
[67]. The peptides obtained after this digestion were
analyzed by nanoLC-MS/MS as described by Lu et al. [68].
Raw data from the LTQ-Orbitrap were analyzed with
MaxQuant software [69,70] to identify the proteins and
allow label-free relative quantification. MaxQuant 1.1.36
settings were used according to the description by Peng
et al. [71]. The Fol protein database used for the
analysis was obtained from the Fusarium Comparative
Genome website
(http://www.broadinstitute.org/annotation/genome/fusarium_group/MultiHome.html)
and supplemented by adding the sequences of known Six
proteins that are not annotated in the public database. A
"contaminant" database was used that contains proteins
such as trypsin and human keratins [71]. Bioinformatic
analysis of the MaxQuant workflow output and statistical
analysis of the abundances of the identified proteins
were performed using Perseus (available at
www.MaxQuant.org) [70]. Only proteins identified with at
least two peptides, of which at least one was unique,
were kept.
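
The final filtering criterion can be written as a short Python sketch. The record layout and the field names "peptides" and "unique_peptides" are hypothetical stand-ins and are not claimed to match the MaxQuant or Perseus output columns; the protein identifiers are invented for the example.

```python
def keep_protein(n_peptides, n_unique):
    """Apply the stated rule: at least two peptides, at least one unique."""
    return n_peptides >= 2 and n_unique >= 1


def filter_proteins(rows):
    """Keep only the protein records that pass keep_protein()."""
    return [
        row for row in rows
        if keep_protein(row["peptides"], row["unique_peptides"])
    ]


# Invented records, for illustration only.
example = [
    {"id": "protA", "peptides": 5, "unique_peptides": 3},  # kept
    {"id": "protB", "peptides": 1, "unique_peptides": 1},  # one peptide only
    {"id": "protC", "peptides": 4, "unique_peptides": 0},  # no unique peptide
]
print([row["id"] for row in filter_proteins(example)])
```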

Additional files 



Additional file 1: Detailed annotation of the Fol4287 pathogenicity 
chromosome. 

Additional file 2: A putative secondary metabolite gene cluster of
Fol is expressed during tomato infection. Roots of ten-day-old
susceptible (without resistance genes) tomato seedlings were inoculated
with conidiospores of Fol004. Roots were harvested 8 dpi (days post
inoculation). RNA was extracted from the collected roots and (RT-)PCR
was performed to detect transcripts of the indicated genes. Numbers
represent FOXG numbers of the Fol4287 reference genome. Marker sizes
are indicated on the right. C: cDNA, G: genomic DNA.

Additional file 3: Significance of the association between the 
TCGGCA element and upstream regions of effector genes. 

Additional file 4: Complex repeat structure in SIX8, SIX8b and SIX14 
upstream regions. The most upstream sequence shared between the 
SIX8 and SIX8b loci (dark grey, blue and green highlighted) is more similar
between SIX8 and SIX8b loci than the coding sequences and the 
immediate upstream sequences (light grey). The SIX8b upstream region is 
the most complex. Compared to that of SIX8, there are: (a) a mimp4 
insertion, (b) a Han insertion, (c) an inversion and duplication (indicated 
with < signs), (d) a mimp1 insertion, (e) a partial mimp3 and (f) an extra
sequence that includes an mFot5. A total of 9 mimp-related inverted 
repeats are present, of which two are interrupted by a TE. Part of the 
SIX14 upstream region is almost identical to a part of the SIX8b upstream 
region (green/blue highlighted including the mimp4) - except that the 
Han insertion is missing in the SIX14 locus. In both cases, a mimp1 is
present immediately downstream of this region but, though similar in
sequence, these mimp1 insertions appear to be independent. Blue
capital letters: effector ORF (introns in lower case); Green capital letters: 
mimp; Dark red capital letters: mFot5; Orange capital letters: Han; Gray 
highlight: shared between SIX8 and SIX8b loci only; Light gray highlight: 
similarity between SIX8 and SIX8b upstream (leader/promoter) sequences; 
Blue highlight: mimp-like inverted repeat sequence, present one or more 
times in SIX8, SIX8b and SIX14 loci (numbers of likely orthologous 
sequences correspond between the three loci - note that mimp-IR1 does
not conform to the consensus sequence for mimp inverted repeats); 
Green and dark green highlight: sequences present one or more times in 
SIX8, SIX8b and SIX14 loci; Yellow highlight: TGCCGA motif; Bold: target 
site duplications associated with TE insertions. 

Additional file 5: Newly identified TEs of Fol. 

Additional file 6: Primers used in this study. 